     Differentiated Services (Diffserv) and Real-Time Communication

Abstract

   This memo describes the interaction between Differentiated Services
   (Diffserv) network quality-of-service (QoS) functionality and real-
   time network communication, including communication based on the
   Real-time Transport Protocol (RTP).  Diffserv is based on network
   nodes applying different forwarding treatments to packets whose IP
   headers are marked with different Diffserv Codepoints (DSCPs).
   WebRTC applications, as well as some conferencing applications, have
   begun using the Session Description Protocol (SDP) bundle negotiation
   mechanism to send multiple traffic streams with different QoS
   requirements using the same network 5-tuple.  Using multiple DSCPs
   to obtain different QoS treatments within a single network 5-tuple
   interacts with transport protocols, particularly with congestion
   control functionality (e.g., as a result of reordering).  In
   addition, DSCP markings may be changed or removed between the traffic
   source and destination.  This memo covers the implications of these
   Diffserv aspects for real-time network communication, including
   WebRTC.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF).  It represents the consensus of the IETF community.  It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).  Not all documents
   approved by the IESG are a candidate for any level of Internet
   Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7657.

1.  Introduction

   This memo describes the interactions between Differentiated Services
   (Diffserv) network quality-of-service (QoS) functionality [RFC2475]
   and real-time network communication, including communication based on
   the Real-time Transport Protocol (RTP) [RFC3550].  Diffserv is based
   on network nodes applying different forwarding treatments to packets
   whose IP headers are marked with different Diffserv Codepoints
   (DSCPs) [RFC2474].  In the past, distinct RTP streams have been sent
   over different transport-level flows, sometimes multiplexed with the
   RTP Control Protocol (RTCP).  WebRTC applications, as well as some
   conferencing applications, are now using the Session Description
   Protocol (SDP) [RFC4566] bundle negotiation mechanism [SDP-BUNDLE] to
   send multiple traffic streams with different QoS requirements using
   the same network 5-tuple.  Using multiple DSCPs to obtain different
   QoS treatments within a single network 5-tuple interacts with
   transport protocols, particularly with congestion control
   functionality (e.g., as a result of reordering).  In addition, DSCP
   markings may be changed or removed between the traffic source and
   destination.  This memo covers the implications of these Diffserv
   aspects for real-time network communication, including WebRTC
   traffic [WEBRTC-OVERVIEW].

   The memo is organized as follows.  Background is provided in
   Section 2 on real-time communications and Section 3 on Differentiated
   Services.  Section 4 describes some examples of Diffserv usage with
   real-time communications.  Section 5 explains how use of Diffserv
   features interacts with both transport and real-time communications
   protocols, and Section 6 provides guidance on Diffserv feature usage
   to control undesired interactions.  Security considerations are
   discussed in Section 7.

2.  Real-Time Communications

   Real-time communications enables communication in real time over an
   IP network using voice, video, text, content sharing, etc.  It is
   possible to use more than one of these modes concurrently to provide
   a rich communication experience.

   A simple example of real-time communications is a voice call placed
   over the Internet where an audio stream is transmitted in each
   direction between two users.  A more complex example is an immersive
   videoconferencing system that has multiple video screens, multiple
   cameras, multiple microphones, and some means of sharing content.
   For such complex systems, there may be multiple media and non-media
   streams transmitted via a single IP address and port or via multiple
   IP addresses and ports.

2.1.  RTP Background

   The most common protocol used for real-time media is RTP [RFC3550].
   RTP defines a common encapsulation format and handling rules for
   real-time data transmitted over the Internet.  Unfortunately, RTP
   terminology usage has been inconsistent.  For example, RFC 7656
   [RFC7656] on RTP terminology observes that:

      RTP [RFC3550] uses media stream, audio stream, video stream, and a
      stream of (RTP) packets interchangeably, which are all RTP
      streams.

   Terminology in this memo is based on that RTP terminology document
   with the following terms being of particular importance (see that
   terminology document for full definitions):

   Source Stream:  A reference clock synchronized, time progressing,
      digital media stream.

   RTP Stream:  A stream of RTP packets containing media data, which may
      be source data or redundant data.  The RTP stream is identified by
      an RTP synchronization source (SSRC) belonging to a particular RTP
      session.  An RTP stream may be a secured RTP stream when RTP-based
      security is used.

   In addition, this memo follows [RFC3550] in using the term "SSRC" to
   designate both the identifier of an RTP stream and the entity that
   sends that RTP stream.

   Media encoding and packetization of a source stream results in a
   source RTP stream plus zero or more redundancy RTP streams that
   provide resilience against loss of packets from the source RTP stream
   [RFC7656].  Redundancy information may also be carried in the same
   RTP stream as the encoded source stream, e.g., see Section 7.2 of
   [RFC5109].  With most applications, a single media type (e.g., audio)
   is transmitted within a single RTP session.  However, it is possible
   to transmit multiple, distinct source streams over the same RTP
   session as one or more individual RTP streams.  This is referred to
   as RTP multiplexing.  In addition, an RTP stream may contain multiple
   source streams, e.g., components or programs in an MPEG Transport
   Stream [H.221].

   The number of source streams and RTP streams in an overall real-time
   interaction can be surprisingly large.  In addition to a voice source
   stream and a video source stream, there could be separate source
   streams for each of the cameras or microphones on a videoconferencing
   system.  As noted above, there might also be separate redundancy RTP
   streams that provide protection to a source RTP stream, using
   techniques such as forward error correction.  Another example is
   simulcast transmission, where a video source stream can be
   transmitted as high resolution and low resolution RTP streams at the
   same time.  In this case, a media processing function might choose to
   send one or both RTP streams onward to a receiver based on bandwidth
   availability or who the active speaker is in a multipoint conference.
   Lastly, a transmitter might send the same media content concurrently
   as two RTP streams using different encodings (e.g., video encoded as
   VP8 [RFC6386] in parallel with H.264 [H.264]) to allow a media
   processing function to select a media encoding that best matches the
   capabilities of the receiver.

   For the WebRTC protocol suite [WEBRTC-TRANSPORTS], an individual
   source stream is a MediaStreamTrack, and a MediaStream contains one
   or more MediaStreamTracks [W3C.WD-mediacapture-streams-20130903].  A
   MediaStreamTrack is transmitted as a source RTP stream plus zero or
   more redundant RTP streams, so a MediaStream that consists of one
   MediaStreamTrack is transmitted as a single source RTP stream plus
   zero or more redundant RTP streams.  For more information on use of
   RTP in WebRTC, see [RTP-USAGE].

   RTP is usually carried over a datagram protocol, such as UDP
   [RFC768], UDP-Lite [RFC3828], or the Datagram Congestion Control
   Protocol (DCCP) [RFC4340]; UDP is most commonly used, but a non-
   datagram protocol (e.g., TCP [RFC793]) may also be used.  Transport
   protocols other than UDP or UDP-Lite may also be used to transmit
   real-time data or near-real-time data.  For example, the Stream
   Control Transmission Protocol (SCTP) [RFC4960] can be utilized to
   carry application-sharing or whiteboarding information as part of an
   overall interaction that includes real-time media.  These additional
   transport protocols can be multiplexed with an RTP session via UDP
   encapsulation, thereby using a single pair of UDP ports.

   The WebRTC protocol suite encompasses a number of forms of
   multiplexing:

   1.  Individual source streams are carried in one or more individual
       RTP streams.  These RTP streams can be multiplexed onto a single
       transport-layer flow or sent as separate transport-layer flows.
       This memo only considers the case where the RTP streams are to be
       multiplexed onto a single transport-layer flow, forming a single
       RTP session as described in [RFC3550];

   2.  RTCP (see [RFC3550]) may be multiplexed onto the same transport-
       layer flow as the RTP streams with which it is associated, as
       described in [RFC5761], or it may be sent on a separate
       transport-layer flow;

   3.  An RTP session could be multiplexed with a single SCTP
       association over Datagram Transport Layer Security (DTLS) and
       with both Session Traversal Utilities for NAT (STUN) [RFC5389]
       and Traversal Using Relays around NAT (TURN) [RFC5766] traffic
       into a single transport-layer flow as described in [RFC5764] with
       the updates in [SRTP-DTLS].  The STUN and TURN protocols provide
       NAT/FW (Network Address Translator / Firewall) traversal and port
       mapping.
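
   The demultiplexing implied by item 3 can be sketched as follows.
   This is a minimal illustration, assuming the first-byte ranges given
   in Section 5.1.2 of [RFC5764]; the function name and the returned
   labels are hypothetical, and the updates in [SRTP-DTLS] are not
   modeled.

```python
def classify_first_byte(packet: bytes) -> str:
    """Classify a datagram received on a shared 5-tuple by its first byte.

    Ranges follow the demultiplexing rule in RFC 5764, Section 5.1.2.
    Later updates to that rule are intentionally not covered here.
    """
    if not packet:
        return "unknown"
    first = packet[0]
    if first < 2:
        return "stun"
    if 20 <= first <= 63:
        return "dtls"
    if 128 <= first <= 191:
        return "rtp-or-rtcp"
    return "unknown"
```

   SCTP data channel traffic is carried inside DTLS, so at this layer
   it is classified as DTLS rather than as a separate protocol.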

   The resulting transport-layer flow is identified by a network
   5-tuple, i.e., a combination of two IP addresses (source and
   destination), two ports (source and destination), and the transport
   protocol used (e.g., UDP).  SDP bundle negotiation restrictions
   [SDP-BUNDLE] limit WebRTC to using at most a single DTLS session per
   network 5-tuple.  In contrast to WebRTC use of a single SCTP
   association with DTLS, multiple SCTP associations can be directly
   multiplexed over a single UDP 5-tuple as specified in [RFC6951].

   The STUN and TURN protocols were originally designed to use UDP as a
   transport; however, TURN has been extended to use TCP as a transport
   for situations in which UDP does not work [RFC6062].  When TURN
   selects use of TCP, the entire real-time communications session is
   carried over a single TCP connection (i.e., 5-tuple).

   For IPv6, addition of the flow label [RFC6437] to network 5-tuples
   results in network 6-tuples (or 7-tuples for bidirectional flows),
   but in practice, use of a flow label is unlikely to result in a
   finer-grain traffic subset than the corresponding network 5-tuple
   (e.g., the flow label is likely to represent the combination of two
   ports with use of the UDP protocol).  For that reason, discussion in
   this document focuses on UDP 5-tuples.
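
   As an illustration of the 5-tuple concept, the sketch below groups
   packets by 5-tuple and records which DSCPs appear on each flow.  The
   type name, function name, addresses, and ports are all hypothetical;
   the point is only that a bundled session yields one flow key that
   carries several DSCPs.

```python
from collections import defaultdict
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Network 5-tuple identifying a transport-layer flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def dscps_per_flow(packets):
    """Map each 5-tuple to the set of DSCP values seen on it.

    `packets` is an iterable of (FiveTuple, dscp) pairs.
    """
    seen = defaultdict(set)
    for flow, dscp in packets:
        seen[flow].add(dscp)
    return dict(seen)


# Hypothetical bundled session: audio, video, and data on one UDP flow.
bundle = FiveTuple("192.0.2.1", "198.51.100.7", 50000, 50002, "UDP")
packets = [(bundle, 46), (bundle, 34), (bundle, 0)]
result = dscps_per_flow(packets)
```

   Here `result` has a single key (the bundled flow) whose value is the
   set {0, 34, 46}, which is the situation the rest of this memo is
   concerned with.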

2.2.  RTP Multiplexing

   Section 2.1 explains how source streams can be multiplexed in a
   single RTP session, which can in turn be multiplexed over UDP with
   packets generated by other transport protocols.  This section
   provides background on why this level of multiplexing is desirable.
   The rationale in this section applies both to multiplexing of source
   streams in a single RTP session and multiplexing of an RTP session
   with traffic from other transport protocols via UDP encapsulation.

   Multiplexing reduces the number of ports utilized for real-time and
   related communication in an overall interaction.  While a single
   endpoint might have plenty of ports available for communication, this
   traffic often traverses points in the network that are constrained on
   the number of available ports or whose performance degrades as the
   number of ports in use increases.  A good example is a NAT/FW device
   sitting at the network edge.  As the number of simultaneous protocol
   sessions increases, so does the burden placed on these devices to
   provide port mapping.

   Another reason for multiplexing is to help reduce the time required
   to establish bidirectional communication.  Since any two
   communicating users might be situated behind different NAT/FW
   devices, it is necessary to employ techniques like STUN and TURN
   along with Interactive Connectivity Establishment (ICE) [RFC5245] to
   get traffic to flow between the two devices [WEBRTC-TRANSPORTS].
   Performing the tasks required by these protocols takes time,
   especially when multiple protocol sessions are involved.  While tasks
   for different sessions can be performed in parallel, it is
   nonetheless necessary for applications to wait for all sessions to be
   opened before communication between two users can begin.  Reducing
   the number of STUN/ICE/TURN steps reduces the likelihood of loss of a
   packet for one of these protocols; any such loss adds delay to
   setting up a communication session.  Further, reducing the number of
   STUN/ICE/TURN tasks places a lower burden on the STUN and TURN
   servers.

   Multiplexing may reduce the complexity and resulting load on an
   endpoint.  A single instance of STUN/ICE/TURN is simpler to execute
   and manage than multiple instances of STUN/ICE/TURN operations
   happening in parallel, as the latter require synchronization and
   create more complex failure situations that have to be cleaned up by
   additional code.

3.  Differentiated Services (Diffserv)

   The Diffserv architecture [RFC2475][RFC4594] is intended to enable
   scalable service discrimination in the Internet without requiring
   each node in the network to store per-flow state and participate in
   per-flow signaling.  The services may be end to end or within a
   network; they include both those that can satisfy quantitative
   performance requirements (e.g., peak bandwidth) and those based on
   relative performance (e.g., "class" differentiation).  Services can
   be constructed by a combination of well-defined building blocks
   deployed in network nodes that:

   o  classify traffic and set bits in an IP header field at network
      boundaries or hosts,

   o  use those bits to determine how packets are forwarded by the nodes
      inside the network, and

   o  condition the marked packets at network boundaries in accordance
      with the requirements or rules of each service.

   Traffic conditioning may include changing the DSCP in a packet
   (remarking it), delaying the packet (as a consequence of traffic
   shaping), or dropping the packet (as a consequence of traffic
   policing).

   A network node that supports Diffserv includes a classifier that
   selects packets based on the value of the DS field in IP headers (the
   Diffserv codepoint or DSCP), along with buffer management and packet
   scheduling mechanisms capable of delivering the specific packet
   forwarding treatment indicated by the DS field value.  Setting of the
   DS field and fine-grain conditioning of marked packets need only be
   performed at network boundaries; internal network nodes operate on
   traffic aggregates that share a DS field value, or in some cases, a
   small set of related values.
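
   The relationship between a DSCP and the DS field octet can be
   sketched as follows, assuming the layout from [RFC2474] in which the
   DSCP occupies the upper six bits of the octet and the remaining two
   bits are used for ECN.  The helper names are hypothetical.  The
   `mark_udp_socket` helper uses the `IP_TOS` socket option for IPv4;
   whether the operating system honors the request is platform
   dependent, and IPv6 uses a different socket option.

```python
import socket


def dscp_to_ds_octet(dscp: int, ecn: int = 0) -> int:
    """Place a 6-bit DSCP and a 2-bit ECN value into one octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def ds_octet_to_dscp(octet: int) -> int:
    """Extract the DSCP (upper six bits) from a DS field octet."""
    return (octet >> 2) & 0x3F


def mark_udp_socket(sock: socket.socket, dscp: int) -> None:
    """Request a DSCP for IPv4 packets sent on `sock`.

    Assumption: the platform exposes IP_TOS and permits the
    application to set it; networks may still remark the traffic.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    dscp_to_ds_octet(dscp))
```

   For example, a DSCP of 46 corresponds to a DS field octet of 184
   when the ECN bits are zero.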

   The Diffserv architecture [RFC2475] maintains distinctions among:

   o  the QoS service provided to a traffic aggregate,

   o  the conditioning functions and per-hop behaviors (PHBs) used to
      realize services,

   o  the DSCP in the IP header used to mark packets to select a per-hop
      behavior, and

   o  the particular implementation mechanisms that realize a per-hop
      behavior.

   This memo focuses on PHBs and the usage of DSCPs to obtain those
   behaviors.  In a network node's forwarding path, the DSCP is used to
   map a packet to a particular forwarding treatment, or to a per-hop
   behavior (PHB) that specifies the forwarding treatment.

   The specification of a PHB describes the externally observable
   forwarding behavior of a network node for network traffic marked with
   a DSCP that selects that PHB.  In this context, "forwarding behavior"
   is a general concept - for example, if only one DSCP is used for all
   traffic on a link, the observable forwarding behavior (e.g., loss,
   delay, jitter) will often depend only on the loading of the link.  To
   obtain useful behavioral differentiation, multiple traffic subsets
   are marked with different DSCPs for different PHBs for which node
   resources such as buffer space and bandwidth are allocated.  PHBs
   provide the framework for a Diffserv network node to allocate
   resources to traffic subsets, with network-scope Differentiated
   Services constructed on top of this basic hop-by-hop resource
   allocation mechanism.

   The codepoints (DSCPs) may be chosen from a small set of fixed values
   (the class selector codepoints), from a set of recommended values
   defined in PHB specifications, or from values that have purely local
   meanings to a specific network that supports Diffserv; in general,
   packets may be forwarded across multiple such networks between source
   and destination.

   The mandatory DSCPs are the class selector codepoints as specified in
   [RFC2474].  The class selector codepoints (CS0-CS7) extend the
   deprecated concept of IP Precedence in the IPv4 header; three bits
   are added, so that the class selector DSCPs are of the form 'xxx000'.
   The all-zero DSCP ('000000' or CS0) is always assigned to a Default
   PHB that provides best-effort forwarding behavior, and the remaining
   class selector codepoints are intended to provide relatively better
   per-hop-forwarding behavior in increasing numerical order, but:

   o  A network endpoint cannot rely upon different class selector
      codepoints providing Differentiated Services via assignment to
      different PHBs, as adjacent class selector codepoints may use the
      same pool of resources on each network node in some networks.
      This generalizes to ranges of class selector codepoints, but with
      limits -- for example, CS6 and CS7 are often used for network
      control (e.g., routing) traffic [RFC4594] and hence are likely to
      provide better forwarding behavior under network load to
      prioritize network recovery from disruptions.  There is no
      effective way for a network endpoint to determine which PHBs are
      selected by the class selector codepoints on a specific network,
      let alone end to end.

   o  CS1 ('001000') was subsequently designated as the recommended
      codepoint for the Lower Effort (LE) PHB [RFC3662].  An LE service
      forwards traffic with "lower" priority than best effort and can be
      "starved" by best-effort and other "higher" priority traffic.  Not
      all networks offer an LE service, hence traffic marked with the
      CS1 DSCP may not receive lower effort forwarding; such traffic may
      be forwarded with a different PHB (e.g., the Default PHB),
      remarked to another DSCP (e.g., CS0) and forwarded accordingly, or
      dropped.  A network endpoint cannot rely upon the presence of an
      LE service that is selected by the CS1 DSCP on a specific network,
      let alone end to end.  Packets marked with the CS1 DSCP may be
      forwarded with best-effort service or another "higher" priority
      service; see [RFC2474].  See [RFC3662] for further discussion of
      the LE PHB and service.
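
   The 'xxx000' form of the class selector codepoints can be expressed
   directly in code.  The following is a small sketch with hypothetical
   helper names; it only computes codepoint values and says nothing
   about how a given network treats them.

```python
def class_selector(n: int) -> int:
    """Return the DSCP for class selector CSn (bit pattern 'xxx000')."""
    if not 0 <= n <= 7:
        raise ValueError("class selector index must be 0..7")
    return n << 3


def is_class_selector(dscp: int) -> bool:
    """True if the low three bits of a 6-bit DSCP are zero."""
    return 0 <= dscp <= 63 and (dscp & 0b000111) == 0


def ip_precedence(dscp: int) -> int:
    """Return the upper three bits, i.e., the former IP Precedence."""
    return (dscp >> 3) & 0b111
```

   For example, `class_selector(1)` is 8, i.e., '001000', which matches
   the CS1 codepoint discussed above.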

3.1.  Diffserv Per-Hop Behaviors (PHBs)

   Although Differentiated Services is a general architecture that may
   be used to implement a variety of services, three fundamental
   forwarding behaviors (PHBs) have been defined and characterized for
   general use.  These are:

   1.  Default Forwarding (DF) for elastic traffic [RFC2474].  The
       Default PHB is always selected by the all-zero DSCP and provides
       best-effort forwarding.

   2.  Assured Forwarding (AF) [RFC2597] to provide Differentiated
       Service to elastic traffic.  Each instance of the AF behavior
       consists of three PHBs that differ only in drop precedence, e.g.,
       AF11, AF12, and AF13; such a set of three AF PHBs is referred to
       as an AF class, e.g., AF1x.  There are four defined AF classes,
       AF1x through AF4x, with higher numbered classes intended to
       receive better forwarding treatment than lower numbered classes.
       Use of multiple PHBs from a single AF class (e.g., AF1x) does not
       enable network traffic reordering within a single network
       5-tuple, although such reordering may occur for other transient
       reasons (e.g., routing changes or ECMP rebalancing).

   3.  Expedited Forwarding (EF) [RFC3246] intended for inelastic
       traffic.  Beyond the basic EF PHB, the VOICE-ADMIT PHB [RFC5865]
       is an admission-controlled variant of the EF PHB.  Both of these
       PHBs are based on preconfigured limited forwarding capacity;
       traffic in excess of that capacity is expected to be dropped.
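
   The recommended codepoints for these PHBs follow a simple bit
   layout, sketched below.  The AF values are the recommended
   codepoints from [RFC2597] (class in the upper three bits, drop
   precedence in the next two), and the EF and VOICE-ADMIT values are
   those recommended in [RFC3246] and [RFC5865].  The function names
   are hypothetical, and networks are free to use other mappings.

```python
EF = 0b101110           # 46
VOICE_ADMIT = 0b101100  # 44


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the recommended DSCP for AF<class><drop_precedence>."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF class must be 1..4")
    if not 1 <= drop_precedence <= 3:
        raise ValueError("drop precedence must be 1..3")
    return (af_class << 3) | (drop_precedence << 1)


def af_class_of(dscp: int):
    """Return the AF class (1..4) of a DSCP, or None if it is not AF."""
    cls = dscp >> 3
    drop = (dscp >> 1) & 0b11
    if (dscp & 1) == 0 and 1 <= cls <= 4 and 1 <= drop <= 3:
        return cls
    return None
```

   For example, `af_dscp(4, 1)` is 34 (AF41) and `af_dscp(4, 3)` is 38
   (AF43); both report AF class 4, whereas EF is not part of any AF
   class.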

3.2.  Traffic Classifiers and DSCP Remarking

   DSCP markings are not end to end in general.  Each network can make
   its own decisions about what PHBs to use and which DSCP maps to each
   PHB.  While every PHB specification includes a recommended DSCP, and
   RFC 4594 [RFC4594] recommends their end-to-end usage, there is no
   requirement that every network support any PHBs (aside from the
   Default PHB for best-effort forwarding) or use any specific DSCPs,
   with the exception of the support requirements for the class selector
   codepoints (see RFC 2474 [RFC2474]).  When Diffserv is used, the edge
   or boundary nodes of a network are responsible for ensuring that all
   traffic entering that network conforms to that network's policies for
   DSCP and PHB usage, and such nodes may change DSCP markings on
   traffic to achieve that result.  As a result, DSCP remarking is
   possible at any network boundary, including the first network node
   that traffic sent by a host encounters.  Remarking is also possible
   within a network, e.g., for traffic shaping.

   DSCP remarking is part of traffic conditioning; the traffic
   conditioning functionality applied to packets at a network node is
   determined by a traffic classifier [RFC2475].  Edge nodes of a
   Diffserv network classify traffic based on selected packet header
   fields; typical implementations do not look beyond the traffic's
   network 5-tuple in the IP and transport protocol headers (e.g., for
   SCTP or RTP encapsulated in UDP, header-based classification is
   unlikely to look beyond the outer UDP header).  As a result, when
   multiple DSCPs are used for traffic that shares a network 5-tuple,
   remarking at a network boundary may result in all of the traffic
   being forwarded with a single DSCP, thereby removing any
   differentiation within the network 5-tuple downstream of the
   remarking location.  Network nodes within a Diffserv network
   generally classify traffic based solely on DSCPs, but may perform
   finer-grain traffic conditioning similar to that performed by edge
   nodes.
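
   The loss of differentiation described above can be illustrated with
   a toy boundary node.  This is a sketch under stated assumptions: the
   classifier looks only at the 5-tuple, the `policy` table and the
   function name are hypothetical, and unknown flows are remarked to
   the all-zero DSCP.

```python
def remark_at_boundary(packets, policy, default_dscp=0):
    """Rewrite DSCPs using a classifier keyed only on the 5-tuple.

    `packets` is a list of (five_tuple, dscp) pairs.  The DSCP set by
    the sender is ignored; every packet on a flow receives the single
    DSCP that `policy` assigns to that flow.
    """
    return [(flow, policy.get(flow, default_dscp))
            for flow, _sender_dscp in packets]


# Hypothetical bundled flow carrying four differently marked streams.
flow = ("192.0.2.1", "198.51.100.7", 50000, 50002, "UDP")
sent = [(flow, 46), (flow, 34), (flow, 38), (flow, 0)]

remarked = remark_at_boundary(sent, {flow: 34})
bleached = remark_at_boundary(sent, {})
```

   In both cases, every packet leaves the boundary node with the same
   DSCP (34 in `remarked`, 0 in `bleached`), so downstream nodes can no
   longer distinguish the streams within the 5-tuple.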

   So, for two arbitrary network endpoints, there can be no assurance
   that the DSCP set at the source endpoint will be preserved and
   presented at the destination endpoint.  Rather, it is quite likely
   that the DSCP will be set to zero (e.g., at the boundary of a network
   operator that distrusts or does not use the DSCP field) or to a value
   deemed suitable by an ingress classifier for whatever network 5-tuple
   it carries.

   In addition, remarking may remove application-level distinctions in
   forwarding behavior - e.g., if multiple PHBs within an AF class are
   used to distinguish different types of frames within a video RTP
   stream, token-bucket-based remarkers operating in color-blind mode
   (see [RFC2697] and [RFC2698] for examples) may remark solely based on
   flow rate and burst behavior, removing the drop precedence
   distinctions specified by the source.
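
   The effect of a color-blind remarker can be sketched with a
   simplified single-rate three-color meter in the style of [RFC2697].
   This is an illustration only: token refill is modeled continuously,
   sizes are in bytes, and the class name, parameters, and traffic
   values are hypothetical.

```python
AF_DROP = {"green": 1, "yellow": 2, "red": 3}


class ColorBlindMeter:
    """Simplified single-rate three-color meter (color-blind mode)."""

    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = cbs, ebs  # both buckets start full
        self.last = 0.0

    def _refill(self, now):
        tokens = (now - self.last) * self.cir
        self.last = now
        to_c = min(tokens, self.cbs - self.tc)  # fill committed first
        self.tc += to_c
        self.te = min(self.ebs, self.te + tokens - to_c)

    def color(self, now, size):
        self._refill(now)
        if self.tc >= size:
            self.tc -= size
            return "green"
        if self.te >= size:
            self.te -= size
            return "yellow"
        return "red"


def remark(meter, af_class, arrivals):
    """Return new DSCPs; the sender's DSCP is deliberately ignored."""
    return [(af_class << 3) | (AF_DROP[meter.color(t, size)] << 1)
            for t, size, _sender_dscp in arrivals]


meter = ColorBlindMeter(cir=1000, cbs=1500, ebs=1500)
# (time, size, sender DSCP): one AF43 packet, then two AF41 packets.
arrivals = [(0.0, 1000, 38), (0.0, 1000, 34), (0.0, 1000, 34)]
remarked = remark(meter, 4, arrivals)
```

   With these values, `remarked` is [34, 36, 38]: the packet the sender
   marked AF43 leaves as AF41, and the two packets the sender marked
   AF41 leave as AF42 and AF43, purely as a function of arrival order
   and burst size.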

   Backbone and other carrier networks may employ a small number of
   DSCPs (e.g., fewer than half a dozen) to manage a small number of
   traffic aggregates; hosts that use a larger number of DSCPs can
   expect to find that much of their intended differentiation is removed
   by such networks.  Better results may be achieved when DSCPs are used
   to spread traffic among a smaller number of Diffserv-based traffic
   subsets or aggregates; see [DIFFSERV-INTERCON] for one proposal.
   This is of particular importance for MPLS-based networks due to the
   limited size of the Traffic Class (TC) field in an MPLS label
   [RFC5462] that is used to carry Diffserv information and the use of
   that TC field for other purposes, e.g., Explicit Congestion
   Notification (ECN) [RFC5129].  For further discussion on use of
   Diffserv with MPLS, see [RFC3270] and [RFC5127].

4.  Examples

   For real-time communications, one might want to mark the audio
   packets using EF and the video packets as AF41.  However, a video
   conference receiving the audio packets significantly ahead of the
   video is not useful because lip sync is necessary between audio and
   video.  It may still be desirable to send audio with a PHB that
   provides better service, because more reliable arrival of audio helps
   assure smooth audio rendering, which is often more important than
   fully faithful video rendering.  There are also limits, as some
   devices have difficulties in synchronizing voice and video when
   packets that need to be rendered together arrive at significantly
   different times.  It makes more sense to use different PHBs when the
   audio and video source streams do not share a strict timing
   relationship.  For example, video content may be shared within a
   video conference via playback, perhaps of an unedited video clip that
   is intended to become part of a television advertisement.  Such
   content-sharing video does not need precise synchronization with
   video conference audio and could use a different PHB, because it is
   more tolerant of jitter, loss, and delay.

   Within a layered video RTP stream, ordering of frame communication is
   preferred, but importance of frame types varies, making use of PHBs
   with different drop precedences appropriate.  For example, I-frames
   that contain an entire image are usually more important than P-frames
   that contain only changes from the previous image because loss of a
   P-frame (or part thereof) can be recovered (at the latest) via the
   next I-frame, whereas loss of an I-frame (or part thereof) may cause
   rendering problems for all of the P-frames that depend on the missing
   I-frame.  For this reason, it is appropriate to mark I-frame packets
   with a PHB that has lower drop precedence than the PHB used for
   P-frames, as long as the PHBs preserve ordering among frames (e.g.,
   are in a single AF class) - AF41 for I-frames and AF43 for P-frames
   is one possibility.  Additional spatial and temporal layers beyond
   the base video layer could also be marked with higher drop precedence
   than the base video layer, as their loss reduces video quality, but
   does not disrupt video rendering.
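
   The I-frame and P-frame example above can be written as a small
   marking policy.  The sketch below assumes the AF41/AF43 choice
   mentioned in the text and the recommended AF codepoint values; the
   table and function names are hypothetical, and the class check
   assumes its inputs are AF codepoints.

```python
AF41, AF43 = 34, 38  # recommended AF codepoints

# Hypothetical policy: lower drop precedence for I-frames.
FRAME_DSCP = {"I": AF41, "P": AF43}


def mark_frames(frame_types):
    """Return the DSCP chosen for each frame, in sending order."""
    return [FRAME_DSCP[t] for t in frame_types]


def share_af_class(dscps):
    """True if all AF DSCPs have the same class bits (upper three)."""
    return len({d >> 3 for d in dscps}) <= 1


marks = mark_frames(["I", "P", "P", "I"])
```

   Because both codepoints belong to AF class 4, `share_af_class(marks)`
   holds, which is the condition under which the different markings are
   not expected to cause reordering among the frames.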

   Additional RTP streams in a real-time communication interaction could
   be marked with CS0 and carried as best-effort traffic.  One example
   is real-time text transmitted as specified in RFC 4103 [RFC4103].
   Best-effort forwarding suffices because such real-time text has loose
   timing requirements; RFC 4103 recommends sending text in chunks every
   300 ms.  Such text is technically real-time, but does not need a PHB
   promising better service than best effort, in contrast to audio or
   video.

   A WebRTC application may use one or more RTP streams, as discussed
   above.  In addition, it may use an SCTP-based data channel
   [DATA-CHAN] whose QoS treatment depends on the nature of the
   application.  For example, best-effort treatment of data channels is
   likely to suffice for messaging, shared white board, and guided
   browsing applications, whereas latency-sensitive games might desire
   better QoS for their data channels.

5.  Diffserv Interactions

5.1.  Diffserv, Reordering, and Transport Protocols

   Transport protocols provide data communication behaviors beyond those
   possible at the IP layer.  An important example is that TCP [RFC793]
   provides reliable in-order delivery of data with congestion control.
   SCTP [RFC4960] provides additional properties such as preservation of
   message boundaries, and the ability to avoid head-of-line blocking
   that may occur with TCP.

   In contrast, UDP [RFC768] is a basic unreliable datagram protocol
   that provides port-based multiplexing and demultiplexing on top of
   IP.  Two other unreliable datagram protocols are UDP-Lite [RFC3828],
   a variant of UDP that may deliver partially corrupt payloads when
   errors occur, and DCCP [RFC4340], which provides a range of
   congestion control modes for its unreliable datagram service.

   Transport protocols that provide reliable delivery (e.g., TCP, SCTP)
   are sensitive to network reordering of traffic.  When a protocol that
   provides reliable delivery receives a packet other than the next
   expected packet, the protocol usually assumes that the expected
   packet has been lost and updates the peer, which often causes a
   retransmission.  In addition, congestion control functionality in
   transport protocols (including DCCP) usually infers congestion when
   packets are lost.  This creates additional sensitivity to significant
   network packet reordering, as such reordering may be (mis)interpreted
   as loss of the out-of-order packets, causing a congestion control
   response.
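
   The way reordering is misread as loss can be sketched with a
   TCP-style model in which the receiver sends cumulative
   acknowledgments and the sender retransmits after three duplicate
   acknowledgments.  This is a deliberately simplified illustration;
   the function names are hypothetical, and real transport protocols
   include further mechanisms that are not modeled here.

```python
def cumulative_acks(arrivals):
    """Return the next-expected sequence number after each arrival."""
    received = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def fast_retransmits(acks, dup_threshold=3):
    """Return packets resent after `dup_threshold` duplicate ACKs."""
    resent = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                resent.append(ack)
        else:
            last_ack, dups = ack, 0
    return resent


in_order = fast_retransmits(cumulative_acks([0, 1, 2, 3, 4, 5]))
mild = fast_retransmits(cumulative_acks([0, 2, 1, 3, 4, 5]))
severe = fast_retransmits(cumulative_acks([0, 2, 3, 4, 1, 5]))
```

   No packet is lost in any of the three cases.  `in_order` and `mild`
   are empty, but `severe` is [1]: packet 1 arrived three positions
   late, so the sender retransmits it and would typically also reduce
   its sending rate.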

   This sensitivity to reordering remains even when ECN [RFC3168] is in
   use, as ECN receivers are required to treat missing packets as
   potential indications of congestion, because:

   o  Severe congestion may cause ECN-capable network nodes to drop
      packets, and

   o  ECN traffic may be forwarded by network nodes that do not support
      ECN and hence drop packets to indicate congestion.

   Congestion control is an important aspect of the Internet
   architecture; see [RFC2914] for further discussion.

   In general, marking packets with different DSCPs results in different
   PHBs being applied at nodes in the network, making reordering very
   likely because each PHB uses a different pool of forwarding
   resources.  Such mixed marking should not be done within a single
   network 5-tuple for current transport protocols, with the important
   exceptions of UDP and UDP-Lite.

   When PHBs that enable reordering are mixed within a single network
   5-tuple, the effect is to mix QoS-based traffic classes within the
   scope of a single transport protocol connection or association.  As
   these QoS-based traffic classes receive different network QoS
   treatments, they use different pools of network resources and hence
   may exhibit different levels of congestion.  The result for
   congestion-controlled protocols is that a separate instance of
   congestion control functionality is needed per QoS-based traffic
   class.  Current transport protocols support only a single instance of
   congestion control functionality for an entire connection or
   association; extending that support to multiple instances would add
   significant protocol complexity.  Traffic in different QoS-based
   classes may use different paths through the network; this complicates
   path integrity checking in connection- or association-based
   protocols, as those paths may fail independently.

   The primary example where usage of multiple PHBs does not enable
   reordering within a single network 5-tuple is use of PHBs from a
   single AF class (e.g., AF1x).  Traffic reordering within the scope of
   a network 5-tuple that uses a single PHB or AF class may occur for
   other transient reasons (e.g., routing changes or ECMP rebalancing).
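
   A traffic source can check whether a set of DSCPs stays within a
   single AF class by decoding the codepoints.  The sketch below
   assumes the standard AF encoding, in which AFxy has the DSCP value
   8x + 2y (x = 1..4, y = 1..3); the function names are illustrative,
   and locally defined codepoints are conservatively treated as
   enabling reordering.

```python
def af_class(dscp):
    """Return the AF class (1-4) of a DSCP, or None if it is not AFxy."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    cls, low = dscp >> 3, dscp & 0b111
    # AFxy is encoded as 8*x + 2*y with x in 1..4 and y in 1..3.
    if 1 <= cls <= 4 and low in (0b010, 0b100, 0b110):
        return cls
    return None


def mix_enables_reordering(dscps):
    """Return True if the DSCP mix may be reordered by the network.

    A single DSCP, or several DSCPs from one AF class, is treated as
    not enabling reordering; every other mix is treated as enabling it.
    """
    distinct = set(dscps)
    if len(distinct) <= 1:
        return False
    classes = {af_class(d) for d in distinct}
    return len(classes) != 1 or None in classes
```

   For example, the mix AF11/AF12/AF13 (10, 12, 14) is reported as not
   enabling reordering, while EF with AF41 (46, 34) is reported as
   enabling it.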

   Reordering also affects other forms of congestion control, such as
   techniques for RTP congestion control that were under development
   when this memo was published; see [RMCAT-CC] for requirements.  These
   techniques prefer use of a common (coupled) congestion controller for
   RTP streams between the same endpoints to reduce packet loss and
   delay by reducing competition for resources at any shared bottleneck.

   Shared bottlenecks can be detected via techniques such as correlation
   of one-way delay measurements across RTP streams.  An alternate
   approach is to assume that the set of packets on a single network
   5-tuple marked with DSCPs that do not enable reordering will utilize
   a common network path and common forwarding resources at each network
   node.  Under that assumption, any bottleneck encountered by such
   packets is shared among all of them, making it safe to use a common
   (coupled) congestion controller (see [COUPLED-CC]).  This is not a
   safe assumption when the packets involved are marked with DSCP values
   that enable reordering because a bottleneck may not be shared among
   all such packets (e.g., when the DSCP values result in use of
   different queues at a network node, but only one queue is a
   bottleneck).
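
   As a rough illustration of delay-correlation-based detection, the
   sketch below computes the Pearson correlation of two one-way delay
   series sampled at the same instants.  It is a simplification and
   not the method of any specific congestion controller; the 0.8
   threshold and the function names are assumptions chosen for the
   example.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def likely_shared_bottleneck(owd_a, owd_b, threshold=0.8):
    """Guess whether two RTP streams share a bottleneck.

    owd_a and owd_b are one-way delay samples (e.g., in milliseconds)
    taken at the same times for the two streams.
    """
    return pearson(owd_a, owd_b) >= threshold
```

   Streams whose delays rise and fall together are candidates for a
   common (coupled) congestion controller; streams marked with DSCPs
   that map to different queues may show little or no such correlation.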

   UDP and UDP-Lite are not sensitive to reordering in the network,
   because they do not provide reliable delivery or congestion control.
   On the other hand, when used to encapsulate other protocols (e.g., as
   UDP is used by WebRTC; see Section 2.1), the reordering
   considerations for the encapsulated protocols apply.  For the
   specific usage of UDP by WebRTC, every encapsulated protocol (i.e.,
   RTP, SCTP, and TCP) is sensitive to reordering as further discussed
   in this memo.  In addition, [RFC5405] provides general guidelines for
   use of UDP (and UDP-Lite); the congestion control guidelines in that
   document apply to protocols encapsulated in UDP (or UDP-Lite).
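
   For illustration, an application can apply different DSCPs to
   datagrams sent on one UDP socket (and hence one network 5-tuple) by
   changing the socket's TOS value between sends.  The sketch below
   uses the IP_TOS socket option for IPv4; the helper names are
   assumptions, platform support for this option varies, and the
   reordering considerations for any encapsulated protocol still apply.

```python
import socket


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2  # the low two bits are the ECN field


def send_marked(sock, payload, address, dscp):
    """Send one datagram on an IPv4 UDP socket with the given DSCP.

    The marking is a request to the local stack; network nodes may
    change or remove it. IPv6 sockets would use IPV6_TCLASS instead.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock.sendto(payload, address)
```

   A caller would create the socket once with
   socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and then call
   send_marked() with, e.g., DSCP 46 (EF) for audio and DSCP 34 (AF41)
   for video.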

5.2.  Diffserv, Reordering, and Real-Time Communication

   Real-time communications are also sensitive to network reordering of
   packets.  Such reordering may lead to unneeded retransmission and
   spurious retransmission control signals (such as NACK) in reliable
   delivery protocols (see Section 5.1).  The degree of sensitivity
   depends on protocol or stream timers, in contrast to reliable
   delivery protocols that usually react to all reordering.

   Receiver jitter buffers have important roles in the effect of
   reordering on real-time communications:

   o  Minor packet reordering that is contained within a jitter buffer
      usually has no effect on rendering of the received RTP stream
      because packets that arrive out of order are retrieved in order
      from the jitter buffer for rendering.

   o  Packet reordering that exceeds the capacity of a jitter buffer can
      cause user-perceptible quality problems (e.g., glitches, noise)
      for delay-sensitive communication, such as interactive
      conversations for which small jitter buffers are necessary to
      preserve human perceptions of real-time interaction.  Interactive
      real-time communication implementations often discard data that is
      sufficiently late so that it cannot be rendered in source stream
      order, making retransmission counterproductive.  For this reason,
      implementations of interactive real-time communication often do
      not use retransmission.

   o  In contrast, replay of recorded media can tolerate significantly
      longer delays than interactive conversations, so replay is likely
      to use larger jitter buffers than interactive conversations.
      These larger jitter buffers increase the tolerance of replay to
      reordering by comparison to interactive conversations.  The size
      of the jitter buffer imposes an upper bound on replay tolerance to
      reordering but does enable retransmission to be used when the
      jitter buffer is significantly larger than the amount of data that
      can be expected to arrive during the round-trip latency for
      retransmission.

   Network packet reordering has no effective upper bound and can exceed
   the size of any reasonable jitter buffer.  In practice, the size of
   jitter buffers for replay is limited by external factors such as the
   amount of time that a human is willing to wait for replay to start.
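
   The relationship between jitter buffer size and reordering tolerance
   can be sketched as follows.  This is a simplified model, not a
   description of any real media stack: the buffer holds a fixed number
   of packets, releases the lowest sequence number when it overflows,
   and discards packets that arrive after their position has already
   been passed.  The class and function names are illustrative, and a
   real jitter buffer would be driven by timers rather than packet
   counts.

```python
import heapq


class JitterBuffer:
    """Hold up to `capacity` packets and release them in sequence order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []
        self._next_seq = 0       # lowest sequence number still renderable
        self.late_discards = 0   # packets that arrived too late to render

    def push(self, seq, payload=b""):
        """Insert a packet; return the (seq, payload) pairs released."""
        if seq < self._next_seq:
            # The stream has already moved past this position.
            self.late_discards += 1
            return []
        heapq.heappush(self._heap, (seq, payload))
        released = []
        while len(self._heap) > self.capacity:
            out = heapq.heappop(self._heap)
            self._next_seq = out[0] + 1
            released.append(out)
        return released

    def drain(self):
        """Release everything still buffered, in sequence order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        if released:
            self._next_seq = released[-1][0] + 1
        return released


def render_order(arrivals, capacity):
    """Return (rendered sequence numbers, number of late discards)."""
    buf = JitterBuffer(capacity)
    rendered = []
    for seq in arrivals:
        rendered.extend(s for s, _ in buf.push(seq))
    rendered.extend(s for s, _ in buf.drain())
    return rendered, buf.late_discards
```

   With a capacity of two packets, a packet displaced by one position
   is rendered in order, while a packet displaced by four positions is
   discarded as late.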

5.3.  Drop Precedence and Transport Protocols

   Packets within the same network 5-tuple that use PHBs within a single
   AF class can be expected to draw upon the same forwarding resources
   on network nodes (e.g., use the same router queue), and hence use of
   multiple drop precedences within an AF class is not expected to cause
   latency variation.  When PHBs within a single AF class are mixed
   within a flow, the resulting overall likelihood that packets will be
   dropped from that flow is a mix of the drop likelihoods of the PHBs
   involved.

   There are situations in which drop precedences should not be mixed.
   A simple example is that there is little value in mixing drop
   precedences within a TCP connection, because TCP's ordered delivery
   behavior results in any drop requiring the receiver to wait for the
   dropped packet to be retransmitted.  Any resulting delay depends on
   the RTT and not the packet that was dropped.  Hence a single DSCP
   should be used for all packets in a TCP connection.

   As a consequence, when TCP is selected for NAT/FW traversal (e.g., by
   TURN), a single DSCP should be used for all traffic on that TCP
   connection.  An additional reason for this recommendation is that
   packetization for STUN/ICE/TURN occurs before passing the resulting
   packets to TCP; TCP resegmentation may result in a different
   packetization on the wire, breaking any association between DSCPs and
   specific data to which they are intended to apply.

   SCTP [RFC4960] differs from TCP in a number of ways, including the
   ability to deliver messages in an order that differs from the order
   in which they were sent and support for unreliable streams.  However,
   SCTP performs congestion control and retransmission across the entire
   association, and not on a per-stream basis.  Although there may be
   advantages to using multiple drop precedences across SCTP streams or
   within an SCTP stream that does not use reliable ordered delivery,
   there is no practical operational experience in doing so (e.g., the
   SCTP sockets API [RFC6458] does not support use of more than one DSCP
   for an SCTP association).  As a consequence, the impacts on SCTP
   protocol and implementation behavior are unknown and difficult to
   predict.  Hence a single DSCP should be used for all packets in an
   SCTP association, independent of the number or nature of streams in
   that association.  Similar reasoning applies to a DCCP connection; a
   single DSCP should be used because the scope of congestion control is
   the connection and there is no operational experience with using more
   than one DSCP.  This recommendation may be revised in the future if
   experiments, analysis, and operational experience provide compelling
   reasons to change it.

   Guidance on transport protocol design and implementation to provide
   support for use of multiple PHBs and DSCPs in a transport protocol
   connection (e.g., DCCP) or transport protocol association (e.g.,
   SCTP) is out of scope for this memo.

5.4.  Diffserv and RTCP

   RTCP [RFC3550] is used with RTP to monitor quality of service and
   convey information about RTP session participants.  A sender of RTCP
   packets that also sends RTP packets (i.e., originates an RTP stream)
   should use the same DSCP marking for both types of packets.  If an
   RTCP sender doesn't send any RTP packets, it should mark its RTCP
   packets with the DSCP that it would use if it did send RTP packets
   with media similar to the RTP traffic that it receives.  If the RTCP
   sender uses or would use multiple DSCPs that differ only in drop
   precedence for RTP, then it should use the DSCP with the least
   likelihood of drop for RTCP to increase the likelihood of RTCP packet
   delivery.
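
   The RTCP marking rule above can be sketched as a small selection
   function.  The sketch assumes the standard AF encoding (AFxy has
   DSCP value 8x + 2y, so the lowest value in a class has the lowest
   drop precedence).  This memo gives no rule for RTP DSCPs that differ
   in more than drop precedence, so the sketch raises an error in that
   case; that behavior and the function name are assumptions.

```python
AF_DROP_BITS = (0b010, 0b100, 0b110)  # low, medium, high drop precedence


def rtcp_dscp(rtp_dscps):
    """Pick the DSCP for RTCP from the DSCPs used (or usable) for RTP."""
    distinct = sorted(set(rtp_dscps))
    if not distinct:
        raise ValueError("at least one RTP DSCP is required")
    if len(distinct) == 1:
        return distinct[0]
    same_class = len({d >> 3 for d in distinct}) == 1
    all_af = all(
        1 <= (d >> 3) <= 4 and (d & 0b111) in AF_DROP_BITS
        for d in distinct
    )
    if not (same_class and all_af):
        raise ValueError("RTP DSCPs differ in more than drop precedence")
    # Within one AF class, the smallest value is least likely to be dropped.
    return distinct[0]
```

   For example, an SSRC that sends RTP marked AF41, AF42, and AF43
   (34, 36, 38) would mark its RTCP packets AF41 (34).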

   If the SDP bundle extension [SDP-BUNDLE] is used to negotiate sending
   multiple types of media in a single RTP session, then receivers will
   send separate RTCP reports for each type of media, using a separate
   SSRC for each media type; each RTCP report should be marked with the
   DSCP corresponding to the type of media handled by the reporting
   SSRC.

   This guidance may result in different DSCP markings for RTP streams
   and RTCP receiver reports about those RTP streams.  The resulting
   variation in network QoS treatment by traffic direction is necessary
   to obtain representative round-trip time (RTT) estimates that
   correspond to the media path RTT, which may differ from the transport
   protocol RTT.  RTCP receiver reports may be relatively infrequent,
   and hence the resulting RTT estimates are of limited utility for
   transport protocol congestion control (although those RTT estimates
   have other important uses; see [RFC3550]).  For this reason, it is
   important that RTCP receiver reports sent by an SSRC receive the same
   network QoS treatment as the RTP stream being sent by that SSRC.

6.  Guidelines

   The only use of multiple standardized PHBs and DSCPs that does not
   enable network reordering among packets marked with different DSCPs
   is use of PHBs within a single AF class.  All other uses of multiple
   PHBs and/or the class selector DSCPs enable network reordering of
   packets that are marked with different DSCPs.  Based on this and the
   foregoing discussion, the guidelines in this section apply to use of
   Diffserv with real-time communications.

   Applications and other traffic sources (including RTP SSRCs):

   o  Should limit use of DSCPs within a single RTP stream to those
      whose corresponding PHBs do not enable packet reordering.  If this
      is not done, significant network reordering may overwhelm
      implementation assumptions about reordering limits, e.g., jitter
      buffer size, causing poor user experiences (see Section 5.2).
      This guideline applies to all of the RTP streams that are within
      the scope of a common (coupled) congestion controller when that
      controller does not use per-RTP-stream measurements for bottleneck
      detection.

   o  Should use a single DSCP for RTCP packets, which should be a DSCP
      used for RTP packets that are or would be sent by that SSRC (see
      Section 5.4).

   o  Should use a single DSCP for all packets within a reliable
      transport protocol session (e.g., TCP connection, SCTP
      association) or DCCP connection (see Sections 5.1 and 5.3).  For
      SCTP, this guideline applies across the entire SCTP association,
      and not just to individual streams within an association.  When
      TURN selects TCP for NAT/FW traversal, this guideline applies to
      all traffic multiplexed onto that TCP connection, in contrast to
      use of UDP for NAT/FW traversal.

   o  May use different DSCPs whose corresponding PHBs enable reordering
      within a single UDP or UDP-Lite 5-tuple, subject to the above
      constraints.  The service differentiation provided by such usage
      is unreliable, as it may be removed or changed by DSCP remarking
      at network boundaries as described in Section 3.2 above.

   o  Cannot rely on end-to-end preservation of DSCPs as network node
      remarking can change DSCPs and remove drop precedence distinctions
      (see Section 3.2).  For example, if a source uses drop precedence
      distinctions within an AF class to identify different types of
      video frames, using those DSCP values at the receiver to identify
      frame type is inherently unreliable.

   o  Should limit use of the CS1 codepoint to traffic for which best
      effort forwarding is acceptable, as network support for use of CS1
      to select a "less than best-effort" PHB is inconsistent.  Further,
      some networks may treat CS1 as providing "better than best-effort"
      forwarding behavior.
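
   The marking guidelines in the list above can be summarized as a
   check that a traffic source might run before sending.  The sketch
   below is one possible reading of those guidelines, not a normative
   algorithm; the scope labels ("tcp", "rtp-stream", and so on) and
   function names are assumptions, and the check covers only the choice
   of DSCPs, not remarking by the network.

```python
SINGLE_DSCP_SCOPES = {"tcp", "sctp", "dccp"}
ANY_MIX_SCOPES = {"udp", "udp-lite"}


def _single_af_class(dscps):
    """True if every DSCP is an AFxy codepoint of the same AF class."""
    return len({d >> 3 for d in dscps}) == 1 and all(
        1 <= (d >> 3) <= 4 and (d & 0b111) in (0b010, 0b100, 0b110)
        for d in dscps
    )


def dscp_mix_allowed(scope, dscps):
    """Return True if the DSCP mix fits the guideline for the scope.

    scope is "tcp", "sctp", "dccp", "udp", "udp-lite", or "rtp-stream".
    For "udp" and "udp-lite", the guidelines for any encapsulated
    protocol or RTP stream still have to be checked separately.
    """
    distinct = set(dscps)
    if len(distinct) <= 1:
        return True
    if scope in SINGLE_DSCP_SCOPES:
        return False
    if scope == "rtp-stream":
        return _single_af_class(distinct)
    if scope in ANY_MIX_SCOPES:
        return True
    raise ValueError(f"unknown scope: {scope!r}")
```

   Under this reading, AF41 with AF42 is acceptable within one RTP
   stream but not within one TCP connection, and EF with AF41 is
   acceptable only at the level of the enclosing UDP 5-tuple.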

   There is no guidance in this memo on how network operators should
   differentiate traffic.  Networks may support all of the PHBs
   discussed herein, classify EF and AFxx traffic identically, or even
   remark all traffic to best effort at some ingress points.
   Nonetheless, it is useful for applications and other traffic sources
   to provide finer granularity DSCP marking on packets for the benefit
   of networks that offer QoS service differentiation.  A specific
   example is that traffic originating from a browser may benefit from
   QoS service differentiation in within-building and residential access
   networks, even if the DSCP marking is subsequently removed or
   simplified.  This is because such networks and the boundaries between
   them are likely traffic bottleneck locations (e.g., due to customer
   aggregation onto common links and/or speed differences among links
   used by the same traffic).

7.  Security Considerations

   The security considerations for all of the technologies discussed in
   this memo apply; in particular, see the security considerations for
   RTP in [RFC3550] and Diffserv in [RFC2474] and [RFC2475].

   Multiplexing of multiple protocols onto a single UDP 5-tuple via
   encapsulation has implications for network functionality that
   monitors or inspects individual protocol flows, e.g., firewalls and
   traffic monitoring systems.  When implementations of such
   functionality lack visibility into encapsulated traffic (likely for
   many current implementations), it may be difficult or impossible to
   apply network security policy and associated controls at a finer
   granularity than the overall UDP 5-tuple.

   Use of multiple DSCPs that enable reordering within an overall real-
   time communication interaction enlarges the set of network forwarding
   resources used by that interaction, thereby increasing exposure to
   resource depletion or failure, independent of whether the underlying
   cause is benign or malicious.  This represents an increase in the
   effective attack surface of the interaction and is a consideration in
   selecting an appropriate degree of QoS differentiation among the
   components of the real-time communication interaction.  See
   Section 3.3.2.1 of [RFC6274] for related discussion of DSCP security
   considerations.

   Use of multiple DSCPs to provide differentiated QoS service may
   reveal information about the encrypted traffic to which different
   service levels are provided.  For example, DSCP-based identification
   of RTP streams combined with packet frequency and packet size could
   reveal the type or nature of the encrypted source streams.  The IP
   header used for forwarding has to be unencrypted so that network
   nodes can read it, and the DSCP likewise has to be unencrypted to
   enable different IP forwarding behaviors to be applied to different
   packets.  The nature
   of encrypted traffic components can be disguised via encrypted dummy
   data padding and encrypted dummy packets, e.g., see the discussion of
   traffic flow confidentiality in [RFC4303].  Encrypted dummy packets
   could even be added in a fashion that an observer of the overall
   encrypted traffic might mistake for another encrypted RTP stream.